AITopics | scene synthesis

SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent

Neural Information Processing Systems, Jun-22-2026, 19:28:19 GMT

Indoor scene synthesis has become increasingly important with the rise of Embodied AI, which requires 3D environments that are not only visually realistic but also physically plausible and functionally diverse. While recent approaches have advanced visual fidelity, they often remain constrained to fixed scene categories, lack sufficient object-level detail and physical consistency, and struggle to align with complex user instructions.

large language model, machine learning, natural language

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Workflow (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)

DeBaRA: Denoising-Based 3D Room Arrangement Generation

Neural Information Processing Systems, Mar-22-2026, 10:36:19 GMT

Generating realistic and diverse layouts of furnished indoor 3D scenes unlocks multiple interactive applications impacting a wide range of industries. The inherent complexity of object interactions, the limited amount of available data and the requirement to fulfill spatial constraints all make generative modeling for 3D scene synthesis and arrangement challenging. Current methods address these challenges autoregressively or by using off-the-shelf diffusion objectives that simultaneously predict all attributes without 3D reasoning considerations. In this paper, we introduce DeBaRA, a score-based model specifically tailored for precise, controllable and flexible arrangement generation in a bounded environment. We argue that the most critical component of a scene synthesis system is to accurately establish the size and position of various objects within a restricted area. Based on this insight, we propose a lightweight conditional score-based model designed with 3D spatial awareness at its core. We demonstrate that by focusing on spatial attributes of objects, a single trained DeBaRA model can be leveraged at test time to perform several downstream applications such as scene synthesis, completion and re-arrangement. Further, we introduce a novel Self Score Evaluation procedure so that it can be optimally employed alongside external LLMs. We evaluate our approach through extensive experiments and demonstrate significant improvement upon state-of-the-art approaches in a range of scenarios.
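The abstract describes a conditional score-based model that places objects inside a bounded room. Purely to illustrate what score-based sampling of object positions looks like, here is a minimal sketch of annealed Langevin dynamics over 2D object centers. It assumes a toy analytic score (that of a Gaussian target perturbed by Gaussian noise) in place of DeBaRA's learned, 3D-aware network; the function names, noise schedule, step size, and clamping to the room bounds are illustrative assumptions, not the paper's method.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def gaussian_score(x: float, mu: float, s0: float, sigma: float) -> float:
    # Hypothetical stand-in for a learned score network: if the clean value is
    # N(mu, s0^2) and N(0, sigma^2) noise is added, the noisy marginal is
    # N(mu, s0^2 + sigma^2), whose score d/dx log p(x) is returned here.
    return -(x - mu) / (s0 ** 2 + sigma ** 2)


def sample_positions(targets: List[Point], room_w: float, room_d: float,
                     s0: float = 0.1, sigma_max: float = 1.0,
                     sigma_min: float = 0.01, n_levels: int = 10,
                     steps_per_level: int = 50, eps: float = 2e-5,
                     seed: int = 0) -> List[Point]:
    """Annealed Langevin dynamics over (x, y) object centers (toy sketch)."""
    rng = random.Random(seed)
    # Geometric noise schedule from sigma_max down to sigma_min.
    sigmas = [sigma_max * (sigma_min / sigma_max) ** (i / (n_levels - 1))
              for i in range(n_levels)]
    bounds = (room_w, room_d)
    # Start every object at a uniformly random point inside the room.
    pos = [[rng.uniform(0, room_w), rng.uniform(0, room_d)] for _ in targets]
    for sigma in sigmas:
        # Step size scales with sigma^2 so each noise level mixes at a similar rate.
        alpha = eps * (sigma / sigma_min) ** 2
        for _ in range(steps_per_level):
            for p, mu in zip(pos, targets):
                for k in range(2):
                    drift = 0.5 * alpha * gaussian_score(p[k], mu[k], s0, sigma)
                    p[k] += drift + math.sqrt(alpha) * rng.gauss(0.0, 1.0)
                    # Keep the center inside the bounded environment.
                    p[k] = min(max(p[k], 0.0), bounds[k])
    return [(p[0], p[1]) for p in pos]
```

For example, `sample_positions([(1.0, 2.0), (3.0, 0.5)], room_w=4.0, room_d=3.0)` returns two centers inside the 4 x 3 room, each close to its target. In a real system the hypothetical `gaussian_score` would be replaced by a trained network conditioned on the room boundary and object classes; the annealing loop itself does not care where the score comes from.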

artificial intelligence, name change, proceedings

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.99)

DeBaRA: Denoising-Based 3D Room Arrangement Generation

Neural Information Processing Systems, Feb-18-2026, 01:20:20 GMT

We argue that the most critical component of a scene synthesis system is to accurately establish the size and position of various objects within a restricted area.

3a7f9e485845dac27423375c934cb4db-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-10-2026, 12:15:54 GMT

exemplar, in-context exemplar, layoutgpt

Neural Information Processing Systems

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

Neural Information Processing Systems, Feb-10-2026, 12:15:51 GMT

However, such inputs impose a substantial burden on users when compared to simple text inputs. To address the issue, we study how Large Language Models (LLMs) can serve as visual planners by generating layouts from text conditions, and thus collaborate with visual generative models. We propose LayoutGPT, a method to compose in-context visual demonstrations in style sheet language to enhance the visual planning skills of LLMs.

large language model, machine learning, natural language

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

64986d86a17424eeac96b08a6d519059-Paper.pdf

Neural Information Processing Systems, Feb-9-2026, 01:23:34 GMT

international conf, proc, transformer

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Toronto (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Text-to-Scene with Large Reasoning Models

Berdoz, Frédéric, Lanzendörfer, Luca A., Tuninga, Nick, Wattenhofer, Roger

arXiv.org Artificial Intelligence, Nov-14-2025

Prompt-driven scene synthesis allows users to generate complete 3D environments from textual descriptions. Current text-to-scene methods often struggle with complex geometries and object transformations, and tend to show weak adherence to complex instructions. We address these limitations by introducing Reason-3D, a text-to-scene model powered by large reasoning models (LRMs). Reason-3D integrates object retrieval using captions covering physical, functional, and contextual attributes. Reason-3D then places the selected objects based on implicit and explicit layout constraints, and refines their positions with collision-aware spatial reasoning. Evaluated on instructions ranging from simple to complex indoor configurations, Reason-3D significantly outperforms previous methods in human-rated visual fidelity, adherence to constraints, and asset retrieval quality. Beyond its contribution to the field of text-to-scene generation, our work showcases the advanced spatial reasoning abilities of modern LRMs. Additionally, we release the codebase to further the research in object retrieval and placement with LRMs.
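Reason-3D refines object positions with collision-aware spatial reasoning carried out by a large reasoning model. The sketch below does not reproduce that; it is a conventional geometric baseline, assuming objects are axis-aligned rectangles on the floor plan, that shows what one collision-resolution pass can look like: overlapping pairs are pushed apart along the axis of least penetration, and every object is clamped inside the room. `Box`, `penetration`, and `resolve_collisions` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

EPS = 1e-9  # tolerance so boxes that merely touch are not treated as colliding


@dataclass
class Box:
    name: str
    x: float  # center along the room width (m)
    y: float  # center along the room depth (m)
    w: float  # footprint size along x (m)
    d: float  # footprint size along y (m)


def penetration(a: Box, b: Box) -> Tuple[float, float]:
    """Overlap depth along x and y; the boxes collide iff both exceed EPS."""
    ox = (a.w + b.w) / 2 - abs(a.x - b.x)
    oy = (a.d + b.d) / 2 - abs(a.y - b.y)
    return ox, oy


def resolve_collisions(boxes: List[Box], room_w: float, room_d: float,
                       max_iters: int = 100) -> bool:
    """Push overlapping boxes apart; return True once no pair overlaps."""
    for _ in range(max_iters):
        # Clamp first, so the overlap check below sees the positions we return.
        for bx in boxes:
            bx.x = min(max(bx.x, bx.w / 2), room_w - bx.w / 2)
            bx.y = min(max(bx.y, bx.d / 2), room_d - bx.d / 2)
        moved = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                a, b = boxes[i], boxes[j]
                ox, oy = penetration(a, b)
                if ox > EPS and oy > EPS:
                    moved = True
                    # Separate along the cheaper axis, splitting the move evenly.
                    if ox < oy:
                        s = 1.0 if a.x >= b.x else -1.0
                        a.x += s * ox / 2
                        b.x -= s * ox / 2
                    else:
                        s = 1.0 if a.y >= b.y else -1.0
                        a.y += s * oy / 2
                        b.y -= s * oy / 2
        if not moved:
            return True
    return False
```

With a 2 m x 2 m bed centered at (2.0, 2.0) and a 0.5 m nightstand placed at (2.5, 2.2) in a 4 m x 4 m room, one pass moves the bed to x = 1.625 and the nightstand to x = 2.875, leaving them side by side. A reasoning model can instead pick which object should move and where, which a fixed heuristic like this cannot.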

large language model, machine learning, reason-3d

arXiv.org Artificial Intelligence

2509.26091

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent

Yang, Yandan, Jia, Baoxiong, Zhang, Shujie, Huang, Siyuan

arXiv.org Artificial Intelligence, Oct-28-2025

Indoor scene synthesis has become increasingly important with the rise of Embodied AI, which requires 3D environments that are not only visually realistic but also physically plausible and functionally diverse. While recent approaches have advanced visual fidelity, they often remain constrained to fixed scene categories, lack sufficient object-level detail and physical consistency, and struggle to align with complex user instructions. In this work, we present SceneWeaver, a reflective agentic framework that unifies diverse scene synthesis paradigms through tool-based iterative refinement. At its core, SceneWeaver employs a language model-based planner to select from a suite of extensible scene generation tools, ranging from data-driven generative models to visual- and LLM-based methods, guided by self-evaluation of physical plausibility, visual realism, and semantic alignment with user input. This closed-loop reason-act-reflect design enables the agent to identify semantic inconsistencies, invoke targeted tools, and update the environment over successive iterations. Extensive experiments on both common and open-vocabulary room types demonstrate that SceneWeaver not only outperforms prior methods on physical, visual, and semantic metrics, but also generalizes effectively to complex scenes with diverse instructions, marking a step toward general-purpose 3D environment generation. Project website: https://scene-weaver.github.io/.
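SceneWeaver's closed reason-act-reflect loop can be pictured as a small control loop. This is a minimal sketch under loud assumptions. The scene is reduced to three scores (physical, visual, semantic), `reflect` stands in for the self-evaluation, and the "planner" is a hard-coded rule that picks the tool mapped to the weakest criterion. The tool names and gains are hypothetical. In SceneWeaver the planner is a language model choosing among real generation tools.

```python
from typing import Callable, Dict, List, Tuple

Scene = Dict[str, float]  # toy scene: one quality score in [0, 1] per criterion
Tool = Callable[[Scene], Scene]


def reflect(scene: Scene) -> Dict[str, float]:
    """Hypothetical self-evaluation: per-criterion scores in [0, 1]."""
    return {k: scene[k] for k in ("physical", "visual", "semantic")}


def make_tool(target: str, gain: float) -> Tool:
    """Build a hypothetical tool that improves one criterion by `gain`."""
    def tool(scene: Scene) -> Scene:
        updated = dict(scene)
        updated[target] = min(1.0, updated[target] + gain)
        return updated
    return tool


TOOLS: Dict[str, Tool] = {
    "fix_collisions": make_tool("physical", 0.3),
    "refine_textures": make_tool("visual", 0.3),
    "add_requested_objects": make_tool("semantic", 0.3),
}
TOOL_FOR = {"physical": "fix_collisions",
            "visual": "refine_textures",
            "semantic": "add_requested_objects"}


def run_agent(scene: Scene, threshold: float = 0.9,
              max_iters: int = 10) -> Tuple[Scene, List[str]]:
    trace: List[str] = []
    for _ in range(max_iters):
        scores = reflect(scene)                # reflect: evaluate current scene
        weakest = min(scores, key=scores.get)  # reason: find the worst criterion
        if scores[weakest] >= threshold:
            break                              # every criterion is good enough
        tool_name = TOOL_FOR[weakest]
        scene = TOOLS[tool_name](scene)        # act: invoke the targeted tool
        trace.append(tool_name)
    return scene, trace
```

Starting from `{"physical": 0.5, "visual": 0.8, "semantic": 0.2}`, the first action is `add_requested_objects`, and the loop stops once all three scores reach the threshold. Making the tool table a plain dictionary is one way to keep such a loop extensible: registering a new tool does not change the loop itself.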

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2509.20414

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)

DeBaRA: Denoising-Based 3D Room Arrangement Generation

Léopold Maillard

Neural Information Processing Systems, Oct-10-2025, 16:05:26 GMT

We argue that the most critical component of a scene synthesis system is to accurately establish the size and position of various objects within a restricted area.

computer vision and pattern recognition, layout, scene synthesis

Neural Information Processing Systems

Country: Asia (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Supplementary Material for LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

Neural Information Processing Systems, Oct-8-2025, 11:53:33 GMT

Table 1: The prepending instructions provided to GPT-3.5/4 during our LayoutGPT's 2D and 3D …
Task: 2D Layout Planning. Instruction: Given a sentence prompt that will be used to generate an image, plan the layout of the image. Formally, each line should be like "object {width:?px; height:?px; left:?px; top:?px; }".
… Formally, each line should follow the template: FURNITURE {length:?px:
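The 2D instruction in Table 1 asks the LLM to reply with one CSS-like line per object. A small parser makes that output format concrete. The sketch below assumes exactly the 2D template quoted above (integer or decimal pixel values, one object per line) and skips any other lines in the model's reply. `parse_layout` and `format_box` are hypothetical helpers, not LayoutGPT's released code, and the truncated 3D FURNITURE template is not covered.

```python
import re
from typing import Dict, List, Tuple

# One object per line, e.g. 'cat {width: 120px; height: 80px; left: 30px; top: 200px; }'
LINE_RE = re.compile(r"^\s*(?P<name>[^{}]+?)\s*\{(?P<body>[^}]*)\}\s*$")
PROP_RE = re.compile(r"(?P<key>[\w-]+)\s*:\s*(?P<val>-?\d+(?:\.\d+)?)px")


def parse_layout(reply: str) -> List[Tuple[str, Dict[str, float]]]:
    """Extract (object name, {property: pixels}) pairs from an LLM reply."""
    boxes = []
    for line in reply.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # prose or malformed output: skip rather than fail
        props = {p.group("key"): float(p.group("val"))
                 for p in PROP_RE.finditer(m.group("body"))}
        boxes.append((m.group("name"), props))
    return boxes


def format_box(name: str, width: int, height: int, left: int, top: int) -> str:
    """Serialize one box in the same style, e.g. for an in-context exemplar."""
    return (f"{name} {{width: {width}px; height: {height}px; "
            f"left: {left}px; top: {top}px; }}")
```

`format_box("cat", 120, 80, 30, 200)` produces `cat {width: 120px; height: 80px; left: 30px; top: 200px; }`, and `parse_layout` maps that line back to the same numbers. Tolerating stray prose lines is a deliberate choice, since chat models often wrap structured output in commentary.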

exemplar, in-context exemplar, layoutgpt

Neural Information Processing Systems

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
